ABSTRACT

A Bloom filter is a simple space-efficient randomized data 
structure for representing a set in order to support member- 
ship queries. Although Bloom filters allow false positives, 
for many applications the space savings outweigh this draw- 
back when the probability of an error is sufficiently low. We 
introduce compressed Bloom filters, which improve perfor- 
mance when the Bloom filter is passed as a message, and its 
transmission size is a limiting factor. For example, Bloom 
filters have been suggested as a means for sharing Web cache 
information. In this setting, proxies do not share the exact 
contents of their caches, but instead periodically broadcast 
Bloom filters representing their cache. By using compressed 
Bloom filters, proxies can reduce the number of bits broad- 
cast, the false positive rate, and/or the amount of computa- 
tion per lookup. The cost is the processing time for compres- 
sion and decompression, which can use simple arithmetic 
coding, and more memory use at the proxies, which utilize 
the larger uncompressed form of the Bloom filter. 

1. INTRODUCTION 

Bloom filters are an excellent data structure for succinctly 
representing a set in order to support membership queries 
[3]. We describe them in detail in Section 2.1; here, we 
simply note that the data structure is randomized (in that 
it uses randomly selected hash functions), and hence has 
some probability of giving a false positive; that is, it may
incorrectly return that an element is in a set when it is not. 
For many applications, the probability of a false positive 
can be made sufficiently small and the space savings are 
significant enough that Bloom filters are useful. 

In fact, Bloom filters have a great deal of potential for dis- 
tributed protocols where systems need to share information 
about what data they have available. For example, Fan et 
al. describe how Bloom filters can be used for Web cache 
sharing [7]. To reduce message traffic, proxies do not transfer
URL lists corresponding to the exact contents of their
caches, but instead periodically broadcast Bloom filters that 
represent the contents of their cache. If a proxy wishes to 
determine if another proxy has a page in its cache, it checks 
the appropriate Bloom filter. In the case of a false positive, 
a proxy may request a page from another proxy, only to find 
that proxy does not actually have that page cached. In that 
case, some additional delay has been incurred. The small 
chance of a false positive introduced by using a Bloom filter 
is greatly outweighed by the significant reduction in network 
traffic achieved by using the succinct Bloom filter instead of 
sending the full list of cache contents. This technique is used 
in the open source Web proxy cache Squid, where the Bloom 
filters are referred to as Cache Digests [15, 13]. Bloom filters 
have also been suggested for other distributed protocols, e.g.,
[6, 9, 14].

Our paper is based on the following insight: in this situation, 
the Bloom filter plays a dual role. It is both a data struc- 
ture being used at the proxies, and a message being passed 
between them. When we use the Bloom filter as a data struc- 
ture, we may tune its parameters for optimal performance 
as a data structure; that is, we minimize the probability 
of a false positive for a given memory size and number of 
items. Indeed, this is the approach taken in the analysis of 
[7]. If this data structure is also being passed around as a 
message, however, then we introduce another performance 
measure we may wish to optimize for: transmission size. 
Transmission size may be of greater importance when the 
amount of network traffic is a concern but there is memory 
available at the endpoint machines. This is especially true in 
distributed systems where information must be transmitted 
repeatedly from one endpoint machine to many others. For 
example, in the Web cache sharing system described above, 
the required memory at each proxy is linear in the number 
of proxies, while the total message traffic rate is quadratic 
in the number of proxies, assuming point-to-point commu- 
nication is used. Moreover, the amount of memory required 
at the endpoint machines is fixed for the life of the system, 
whereas the traffic is additive over the life of the system.

Transmission size can be affected by using compression. In 
this paper, we show how compressing a Bloom filter can lead 
to improved performance. By using compressed Bloom fil- 
ters, protocols reduce the number of bits broadcast, the false 
positive rate, and/or the amount of computation per lookup. 
The tradeoff costs are the increased processing requirement
for compression and decompression and larger memory
requirements at the endpoint machines, which may use a larger
original uncompressed form of the Bloom filter in order to
achieve improved transmission size.


We let $f = \left(1 - e^{-kn/m}\right)^k = (1 - p)^k$. Note that we use
the asymptotic approximations p and f to represent, respectively,
the probability that a bit in the Bloom filter is 0 and the
probability of a false positive from now on for convenience.


We start by defining the problem as an optimization prob- 
lem, which we solve using some simplifying assumptions. We 
then consider practical issues, including effective compres- 
sion schemes and actual performance. We recommend arith- 
metic coding [11], a simple compression scheme well-suited 
to this situation with fast implementations. We follow by 
showing how to extend our work to other important cases, 
such as in the case where it is possible to update by send- 
ing changes (or deltas) in the Bloom filter rather than new 
Bloom filters. 

Our work underscores an important general principle for 
distributed algorithms: when using a data structure as a 
message, one should consider the parameters of the data 
structure with both of these roles in mind. If transmission 
size is important, tuning the parameters so that compression 
can be used effectively may yield dividends. 


2. COMPRESSED BLOOM FILTERS: THEORY

2.1 Bloom filters 

We begin by introducing Bloom filters, following the frame- 
work and analysis of [7]. 

A Bloom filter for representing a set $S = \{s_1, s_2, \ldots, s_n\}$ of
n elements is described by an array of m bits, initially all
set to 0. A Bloom filter uses k independent hash functions
$h_1, \ldots, h_k$ with range $\{0, \ldots, m-1\}$. For mathematical
convenience, we make the natural assumption that these hash
functions map each item in the universe to a random number
uniform over the range $\{0, \ldots, m-1\}$. For each element
$s \in S$, the bits $h_i(s)$ are set to 1 for $1 \le i \le k$. A
location can be set to 1 multiple times, but only the first
change has an effect. To check if an item x is in S, we check
whether all $h_i(x)$ are set to 1. If not, then clearly x is not a
member of S. If all $h_i(x)$ are set to 1, we assume that x is
in S, although we are wrong with some probability. Hence 
a Bloom filter may yield a false positive, where it suggests 
that an element x is in S even though it is not. For many 
applications, this is acceptable as long as the probability of 
a false positive is sufficiently small. 
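
The insert and lookup procedure just described can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used in [7] or in Squid: the class name BloomFilter and the choice of salted SHA-256 digests as a stand-in for k independent random hash functions are assumptions made here for the example.

```python
import hashlib


class BloomFilter:
    """Bit array of m bits with k hash positions per item (illustrative)."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        self.bits = bytearray((m + 7) // 8)  # all bits start at 0

    def _positions(self, item: str):
        # Assumption: salting SHA-256 with the index i stands in for
        # k independent, uniformly random hash functions h_1, ..., h_k.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        # Setting a bit that is already 1 has no further effect.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # True may be a false positive; False is always correct.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))
```

A lookup such as `"http://example.com/a" in bf` returns False only when at least one of the k probed bits is still 0, which is why a negative answer is never wrong.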

The probability of a false positive for an element not in the 
set, or the false positive rate, can be calculated in a straight- 
forward fashion, given our assumption that hash functions 
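are perfectly random. After all the elements of S are hashed
into the Bloom filter, the probability that a specific bit is
still 0 is

$$\left(1 - \frac{1}{m}\right)^{kn} \approx e^{-kn/m}.$$

We let $p = e^{-kn/m}$. The probability of a false positive is
then

$$\left(1 - \left(1 - \frac{1}{m}\right)^{kn}\right)^k \approx \left(1 - e^{-kn/m}\right)^k = (1 - p)^k.$$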



Although it is clear from the above discussion, it is worth
noting that there are three fundamental performance metrics
for Bloom filters that can be traded off: computation
time (corresponding to the number of hash functions k), size
(corresponding to the array size m), and the probability of
error (corresponding to the false positive rate f).


Suppose we are given m and n and we wish to optimize the
number of hash functions k to minimize the false positive
rate f. There are two competing forces: using more hash
functions gives us more chances to find a 0 bit for an element
that is not a member of S, but using fewer hash functions
increases the fraction of 0 bits in the array. The optimal
number of hash functions that minimizes f as a function of
k is easily found by taking the derivative. More conveniently,
note that f equals $\exp\left(k \ln\left(1 - e^{-kn/m}\right)\right)$. Let
$g = k \ln\left(1 - e^{-kn/m}\right)$. Minimizing the false positive rate f
is equivalent to minimizing g with respect to k. We find

$$\frac{dg}{dk} = \ln\left(1 - e^{-kn/m}\right) + \frac{kn}{m} \cdot \frac{e^{-kn/m}}{1 - e^{-kn/m}}.$$

It is easy to check that the derivative is 0 when $k = (\ln 2) \cdot (m/n)$;
further efforts reveal that this is a global minimum.
In this case the false positive rate f is $(1/2)^k = (0.6185)^{m/n}$.
In practice, of course, k must be an integer, and a smaller k
might be preferred since it reduces the amount of computation
necessary.
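
As a quick numerical check of this optimization, the following sketch evaluates the asymptotic false positive rate for a few integer values of k at m/n = 8. The function names false_positive_rate and optimal_k are illustrative choices made here, not notation from the analysis above.

```python
import math


def false_positive_rate(bits_per_item: float, k: int) -> float:
    """Asymptotic rate f = (1 - e^{-kn/m})^k, with bits_per_item = m/n."""
    return (1.0 - math.exp(-k / bits_per_item)) ** k


def optimal_k(bits_per_item: float) -> float:
    """Real-valued minimizer k = (ln 2) * (m/n)."""
    return math.log(2) * bits_per_item


# Compare integer choices of k around the real-valued optimum for m/n = 8.
for k in (4, 5, 6, 7):
    print(k, round(false_positive_rate(8, k), 4))
print("real-valued optimum:", round(optimal_k(8), 3))
```

Because the real-valued optimum for m/n = 8 falls between 5 and 6, the two neighboring integers give nearly the same false positive rate, which is why a smaller k can be attractive in practice.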


For comparison with later results, it is useful to frame the
optimization another way. Letting f be a function of p, we
find

$$f = (1 - p)^k = (1 - p)^{(-\ln p)(m/n)} = e^{-\ln(p) \ln(1 - p)(m/n)}. \qquad (1)$$

From the symmetry of this expression, it is easy to check
that p = 1/2 minimizes the false positive rate f. Hence
the optimal results are achieved when each bit of the Bloom
filter is 0 with probability (roughly) 1/2.

Note that Bloom filters are highly effective even if m = 
cn for a small constant c, such as c = 8. An alternative 
approach if more bits are available is to simply hash each 
item into O(log n) bits and send a list of hash values. Bloom
filters can allow significantly fewer bits to be sent while still 
achieving very good false positive rates. 


2.2 Compressed Bloom filters 

Our optimization above of the number of hash functions k 
is based on the assumption that we wish to minimize the 
probability of a false positive as a function of the array size m
and the number of objects n. This is the correct optimiza- 
tion if we consider the Bloom filter as an object residing in 
memory. In the Web cache application, however, the Bloom 
filter is not just an object that resides in memory, but an
object that must be transferred between proxies. This fact
suggests that we may not want to optimize the number of 
hash functions for m and n, but instead optimize the num- 
ber of hash functions for the size of the data that needs to 
be sent, or the transmission size. The transmission size, 
however, need not be m; we might be able to compress the 
bit array. Therefore we choose our parameters to minimize 
the failure probability after using compression. 

Let us consider the standard uncompressed Bloom filter,
which is optimized for $k = (\ln 2) \cdot (m/n)$, or equivalently
for p = 1/2. Can we gain anything by compressing the
resulting bit array? Under our assumption of good random
hash functions, the bit array appears to be a random string
of m 0's and 1's, with each entry being 0 or 1 independently
with probability 1/2 (see footnote 1). Hence compression does
not gain anything for this choice of k.

Footnote 1: Technically, this is not precisely true, since bits are
not completely independent: the fact that one bit was set to 1
affects the probability of other bits being set to 1.
Asymptotically (and in practice) this effect is negligible; see, for
example, [1]. Henceforth we make the simplifying assumption
of independence of the bit values in the array. A more
precise argument for the interested reader is given in the
Appendix.

Suppose, however, we instead choose k so that each of the
entries in the m bit array is 1 with probability 1/3. Then
we can take advantage of this fact to compress the m bit
array and reduce the transmission size. After transmission,
the bit array is decompressed for actual use. Note that the
uncompressed Bloom filter size is still m bits. While this
choice of k is not optimal for the uncompressed size m, if
our goal is to optimize for the transmission size, using
compression may yield a better result. The question is whether
this compression gains us anything, or if we would have been
better off simply using a smaller number of bits in our array
and optimizing for that size.

We assume here that all lookup computation on the Bloom
filter is done after decompression at the proxies. A compression
scheme that also provided random access might allow
us to compute on the compressed Bloom filter; however,
achieving random access, efficiency, and good compression
simultaneously is generally difficult. One possibility is to
split the Bloom filter into several pieces, and compress each
piece. To look up a bit would only require decompressing a
certain piece of the filter instead of the entire filter, reducing
the amount of memory required [10]. This approach will
slightly reduce compression but greatly increase computation
if many lookups occur between updates.

To contrast with the original Bloom filter discussion, we
note that for compressed Bloom filters there are now four
fundamental performance metrics for Bloom filters that can
be traded off. Besides computation time (corresponding to
the number of hash functions k) and the probability of error
(corresponding to the false positive rate f), there are two
other metrics: the uncompressed filter size that the Bloom
filter has in the proxy memory, which we continue to denote
by the number of array bits m; and the transmission size
corresponding to its size after compression, which we denote
by z. Our starting point is the consideration that in many
situations the transmission size may be more important than
the uncompressed filter size.

We may establish the problem as an optimization problem
as follows. Let z be the desired compressed size. Recall
that each bit in the bit array is 0 with probability p; we
treat the bits as independent. Also, as a mathematically
convenient approximation, we assume that we have an optimal
compressor. That is, we assume that our m bit filter
can be compressed down to only mH(p) bits, where
$H(p) = -p \log_2 p - (1 - p) \log_2 (1 - p)$ is the entropy function.
Our compressor therefore uses the optimal H(p) bits
on average for each bit in the original string. We consider
the practical implications more carefully subsequently. Here
we note just that near-optimal compressors exist; arithmetic
coding, for example, requires on average less than $H(p) + \epsilon$
bits per character for any $\epsilon > 0$ given suitably large strings.

Our optimization problem is as follows: given n and z,
choose m and k to minimize f subject to $mH(p) \le z$. One
possibility is to choose m = z and $k = (\ln 2) \cdot (m/n)$ so that
p = 1/2; this is the original optimized Bloom filter. Hence
we can guarantee that $f \le (0.6185)^{z/n}$.

We can, however, do better. Indeed, in theory this choice
of k is the worst choice possible once we allow compression.
To see this, let us again write f as a function of p:
$f = (1 - p)^{(-\ln p)(m/n)}$ subject to $m = z/H(p)$ (we may
without loss of generality choose m as large as possible).
Equivalently, we have

$$f = (1 - p)^{-\frac{z \ln p}{n H(p)}}.$$

Since z and n are fixed with $z \ge n$, we may equivalently
seek to minimize $\beta = f^{n/z}$. Simple calculations show:

$$\beta = (1 - p)^{-\frac{\ln p}{H(p)}} = \exp\left( \frac{-\ln p \cdot \ln(1 - p)}{(-\log_2 e)\left(p \ln p + (1 - p) \ln(1 - p)\right)} \right). \qquad (2)$$

It is interesting to compare this equation with equation (1);
the relevant expression in p shows a similar symmetry, here with
additional terms due to the compression. The value of $\beta$ is
maximized when the exponent is maximized, or equivalently
when the term

$$\gamma = \frac{p}{\ln(1 - p)} + \frac{1 - p}{\ln p}$$

is minimized. Note that

$$\frac{d\gamma}{dp} = \frac{1}{\ln(1 - p)} - \frac{1}{\ln p} + \frac{p}{(1 - p) \ln^2(1 - p)} - \frac{1 - p}{p \ln^2(p)}.$$

The value of $d\gamma/dp$ is clearly 0 when p = 1/2, and using symmetry
it is easy to check that $d\gamma/dp$ is negative for p < 1/2 and
positive for p > 1/2. Hence the maximum probability of a
false positive using a compressed Bloom filter occurs when
p = 1/2, or equivalently $k = (\ln 2) \cdot (m/n)$.


We emphasize the point again: the number of hash functions 
that minimizes the false positive rate without compression in 
fact maximizes the false positive rate with compression. Said
in another way, in our idealized setting using compression
always decreases the false positive rate. 
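
The claim can be checked numerically by evaluating the per-transmitted-bit quantity $f^{n/z} = (1 - p)^{-\ln p / H(p)}$ over a range of p, under the same idealized assumption of an optimal compressor. The sketch below is only an illustration; the function name beta is chosen here to match the symbol used in the analysis.

```python
import math


def beta(p: float) -> float:
    """beta = f^(n/z) = (1 - p)^(-ln p / H(p)), with H(p) in bits."""
    h = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
    return (1 - p) ** (-math.log(p) / h)


# beta is symmetric in p and 1 - p and is largest at p = 1/2.
for p in (0.01, 0.1, 0.3, 0.5, 0.7, 0.9, 0.99):
    print(f"p = {p:4.2f}  beta = {beta(p):.4f}")
```

Smaller values are better here, since the false positive rate is this quantity raised to the power z/n. The values fall away from p = 1/2 in both directions, although the approach to the limiting value 1/2 is slow.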

The argument above also shows that $\gamma$ is maximized, and
hence $\beta$ and f are minimized, in one of the limiting situations
as p goes to 0 or 1. In each case, using for example the
expansion $\ln(1 - x) \approx -x - x^2/2 - x^3/3 - \ldots$, we find that
$\gamma$ goes to -1. Hence $\beta$ goes to 1/2 in both limiting cases,
and we can in theory achieve a false positive rate arbitrarily
close to $(0.5)^{z/n}$ by letting the number of hash functions go
to 0 or infinity.

It is an interesting and worthwhile exercise to try to understand
intuitively how the expression $f = (0.5)^{z/n}$ for the
limiting case arises. Suppose we start with a very large bit
array, and use just one hash function for our Bloom filter.
One way of compressing the Bloom filter would be to simply
send the array indices that contain a 1. Note that this is
equivalent to hashing each item into a z/n bit string; that
is, for one hash function and suitably large values of z and
m, a compressed Bloom filter is equivalent to the natural
hashing solution. Thinking in terms of hashing, it is clear
that increasing the number of bits each item hashes into by
1 approximately halves the false positive probability,
which gives some insight into the result for Bloom filters.

In practice we are significantly more constrained than the
limiting situations suggest, since in general letting p go to 0
or 1 corresponds respectively to using an infinite number of
or zero hash functions. Of course, we must use at least one
hash function! Note, however, that the theory shows we may
achieve improved performance by taking $k < (\ln 2) \cdot (m/n)$ for
the compressed Bloom filter. This has the additional benefit 
that a compressed Bloom filter uses fewer hash functions 
and hence requires less computation per lookup. Further 
practical considerations are discussed in the next section. 

The optimization framework developed above is not the only
one possible. For example, one could instead fix the desired
false positive rate f and optimize for the final compressed
size z. To compare in this situation, note that in
the limit as the number of hash functions goes to zero the
compressed Bloom filter has a false positive rate tending to
$(0.5)^{z/n}$ while the standard Bloom filter has a false positive
rate tending to $(0.5)^{(m \ln 2)/n}$. Hence the best possible
compressed Bloom filter achieving the same false positive rate as
the standard Bloom filter would have $z = m \ln 2$, a savings
in size of roughly 30%. Again, this is significantly better
than what can be realized in practice.
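
For readers who want the arithmetic behind the 30% figure, equating the two limiting false positive rates gives the following one-line calculation (a worked restatement of the comparison above, not an additional result):

```latex
\[
(0.5)^{z/n} = (0.5)^{(m \ln 2)/n}
\;\Longrightarrow\;
z = m \ln 2 \approx 0.693\, m,
\qquad
\frac{m - z}{m} = 1 - \ln 2 \approx 0.307 .
\]
```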

The primary point of this theoretical analysis is to demon- 
strate that compression is a viable means of improving per- 
formance, in terms of reducing the false positive rate for 
a desired compressed size, or for reducing the transmission 
size for a fixed false positive rate. Indeed, because the com- 
pressed Bloom filter allows us another performance metric, 
it provides more flexibility than the standard original Bloom 
filter. An additional benefit is that compressed Bloom filters
use a smaller number of hash functions, so that lookups are 
more efficient. Based on this theory, we now consider im- 
plementation details and specific examples. 


3. COMPRESSED BLOOM FILTERS: PRACTICE

Our theoretical analysis avoided several issues that are im- 
portant for a real implementation: 

• Restrictions on m: While the size z of the compressed
Bloom filter may be of primary importance, limitations
on the size m of the uncompressed Bloom filter
also constrain the possibilities. For example, while
theoretically we can do well using one hash function and
compressing, achieving a false positive rate of $\epsilon$ with
one hash function requires $m \approx n/\epsilon$, which may be too
large for real applications.

Also, it may be desirable to have m be a power of two 
for various computations. We do not restrict ourselves 
to powers of two here. 

• Compression overhead: Compression schemes do not 
achieve optimal performance; all compression schemes 
have some associated overhead. Hence the gain from 
the compressed Bloom filter must overcome the asso- 
ciated overhead costs. 

• Compression variability: Of perhaps greater practical 
importance is that if there is an absolute maximum 
packet size, we generally want the compressed
array not to exceed the packet size. Otherwise,
we may have to send multiple packets, leading to in- 
creased network traffic. (In some situations, we may 
want to bound the number of packets used to send the 
compressed Bloom filter; the idea is the same.) Com- 
pression performance varies depending on the input; 
moreover, if the number of elements n in the set S 
cannot be exactly determined in advance, a mispre- 
diction of n could yield insufficient compression. 

• Hashing performance: Depending on the data and the 
hash functions chosen, real hash functions may behave 
differently from the analysis above. 

The issue of achieving good hashing performance on ar- 
bitrary data sets is outside the scope of this paper, and 
we do not consider it further except to raise the following 
points. First, in practice we suspect that using standard 
universal families of hash functions [5, 12] or MD5 (used 
in [7]) will be suitable. Second, in situations where hash- 
ing performance is not sufficiently random, we expect that 
compressed Bloom filters will still generally outperform the 
uncompressed Bloom filter. The point is that if the false 
positive rate of a compressed Bloom filter is increased be- 
cause of weak hash functions, we would expect the false 
positive rate of the uncompressed Bloom filter to increase 
as well; moreover, since compressed Bloom filters use fewer 
hash functions, we expect the effect will be worse for the un- 
compressed filter. For compressed Bloom filters, however, 
there is the additional problem that weak hash functions 
may yield bit arrays that do not compress as much as ex- 
pected. The choice of parameters may therefore have to be 
tuned for the particular data type. 

For compression issues, arithmetic coding provides a flexible 
compression mechanism for achieving near-optimal performance
with low variability. Loosely speaking, for a random
m bit string where the bit values are independent and each
bit is 0 with probability p and 1 with probability 1 - p,
arithmetic coding compresses the string to near mH(p) bits with
high probability, with the deviation from the average having 
a Chernoff-like bound. For more information on arithmetic 
coding, we refer the reader to [11, 16]. For more precise 
statements and details regarding the low variability of arith- 
metic coding, we refer the reader to the Appendix. We note 
that other compression schemes may also be suitable, in- 
cluding for example run-length coding. 

Given this compression scheme, we suggest the following approach.
Choose a maximum desired uncompressed size m.
Then design a compressed Bloom filter using the above theory,
targeting a slightly smaller compressed size than desired; for
example, if the goal is that the compressed size be z, design
the structure so that the compressed size is 0.99z. This provides
room for some variability in compression; the amount
of room necessary depends on m. A similar effect may be
achieved by slightly overestimating n. If our uncompressed 
filter is more than half full of zeroes, then if we have fewer 
than expected elements in the set, our filter will tend to 
have even more zeroes than expected, and hence will com- 
press better. With this design, the compressed filter should 
be within the desired size range with high probability. 

To deal with cases that still do not compress adequately, we
suggest using multiple filter types. Each filter type t is associated
with an array of size m, a set of hash functions, and a
decompression scheme. These types are agreed on ahead of
time. A few bits in the header can be used to represent the
filter type. If one of the filter types is the standard Bloom
filter (no compression) then the set can always be sent
appropriately using at least one of the types. In most cases two
types, compressed and uncompressed, would be sufficient.

3.1 Examples 

We test the theoretical framework above by examining a few 
specific examples of the performance improvements possible 
using compressed Bloom filters. We consider cases where 
eight and sixteen bits are used in the compressed Bloom 
filter for each element; this corresponds to configurations 
examined in [7]. 

Suppose we wish to use at most eight bits per set element in 
our transmission with a Bloom filter; that is, z/n = m/n = 
8. Then using the optimal number of hash functions k = 6 
yields a false positive rate of 0.0216. For k = 5, the false 
positive probability is only 0.0217, so this might be prefer- 
able in practice. If we are willing to allow 14 array bits for 
the uncompressed Bloom filter per set element, then we can 
reduce the false positive rate by almost 20% to 0.0177 and 
reduce the number of hash functions to two while keeping 
the (theoretical) transmitted bits per element z/n below 8, 
as shown in Table 1. 
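
The figures in this example follow directly from the formulas of Section 2. The sketch below recomputes them under the idealized model z = mH(p); the helper names entropy and compressed_design are illustrative choices made here.

```python
import math


def entropy(p: float) -> float:
    """Binary entropy H(p) in bits."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def compressed_design(bits_per_item: float, k: int):
    """Return (z/n, f) under the idealized model z = m * H(p)."""
    p = math.exp(-k / bits_per_item)          # probability a bit is 0
    z_per_item = bits_per_item * entropy(p)   # transmission bits per element
    f = (1 - p) ** k                          # false positive rate
    return z_per_item, f


# (m/n, k) pairs considered for a budget of eight transmitted bits per element.
for m_over_n, k in ((8, 6), (14, 2), (92, 1)):
    z, f = compressed_design(m_over_n, k)
    print(f"m/n = {m_over_n:3d}  k = {k}  z/n = {z:.3f}  f = {f:.4f}")
```

For the standard filter with m/n = 8 and k = 6, the model gives a value of z/n slightly below 8 because p is not exactly 1/2; since the possible gain is negligible, that filter is simply treated as sent uncompressed.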

It is also interesting to compare the standard Bloom and 
the compressed Bloom filter pictorially in this case where 
z/n = 8. In Figure 1 we show the false positive rate as 
a function of the number of hash functions k based on the 
theoretical analysis of Sections 2.1 and 2.2, where we allow k 
to behave as a continuous variable. Note that as the theory 
predicts the optimized uncompressed filter actually yields
the largest false positive rate once we introduce compression.

We tested the compressed Bloom filter via simulation. We
repeated the following experiment 100,000 times. A Bloom
filter for n = 10,000 elements and m = 140,000 bits was created,
with each element being hashed to two positions chosen
independently and uniformly at random in the bit array.
The resulting array was then compressed using a publicly
available arithmetic coding compressor based on the work
of Moffat, Neal, and Witten [4, 11] (see footnote 2). Using
z = mH(p) suggests that the compressed size should be near
9,904 bytes; to meet the bound of 8 bits per element requires
that the compressed size not exceed 10,000 bytes. Over the
100,000 trials we found the average compressed array size to
be 9,920 bytes, including all overhead; the standard deviation was
11.375 bytes; and the maximum compressed array size was
only 9,971 bytes, giving us several bytes of room to spare.
For larger m and n, we would expect even greater concentration
of the compressed size around its mean; for smaller
m and n, the variance would be a larger fraction of the
compressed size. We believe this example provides good insight into
what is achievable in real situations.
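
A rough version of this experiment can be reproduced without an arithmetic coder by measuring the fraction of 0 bits in a randomly filled array and reporting the entropy bound mH(p) in bytes. This is only a sketch under stated assumptions: it estimates the ideal compressed size rather than running the compressor of [4, 11], it uses far fewer trials, and the function name simulate_ideal_size and the seed are arbitrary choices made here.

```python
import math
import random


def simulate_ideal_size(n: int, m: int, k: int, rng: random.Random) -> float:
    """Build one random filter and return m * H(p_hat) in bytes."""
    bits = bytearray(m)                  # one byte per bit, for clarity
    for _ in range(n * k):               # each element sets k random positions
        bits[rng.randrange(m)] = 1
    p_hat = 1 - sum(bits) / m            # observed fraction of 0 bits
    h = -p_hat * math.log2(p_hat) - (1 - p_hat) * math.log2(1 - p_hat)
    return m * h / 8


rng = random.Random(1)                   # arbitrary seed (assumption)
sizes = [simulate_ideal_size(10_000, 140_000, 2, rng) for _ in range(20)]
print(f"mean ideal size: {sum(sizes) / len(sizes):.0f} bytes")
print(f"largest ideal size: {max(sizes):.0f} bytes")
```

The estimates should cluster tightly around the theoretical value; a real compressor adds some overhead on top of this bound, as the measured averages in the text indicate.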

Theoretically we can do even better by using just one hash 
function, although this greatly increases the number of array 
bits per element, as seen in Table 1. 

Similarly, considering the specific case of a Bloom filter where
z/n = m/n = 16, we would use eleven hash functions
to achieve an optimal false positive rate of 0.000459. As
eleven hash functions seems somewhat large, we note that
we could reduce the number of hash functions used without
applying compression, but using only six hash functions
more than doubles f to 0.000935. Table 2 summarizes the
improvements available using compressed Bloom filters. If
we allow 28 array bits per element, our false positive rate
falls about 30% while using only four hash functions. If we
allow 48 array bits per element, our false positive rate falls
over 50% using only three hash functions. We simulated
the case with n = 10,000 elements, m = 480,000 bits, and
k = 3 hash functions using 100,000 trials. The theoretical
considerations above suggest the compressed size will be
19,787 bytes. Over our simulation trials, the average
compressed array size was 19,805 bytes, including all overhead;
the standard deviation was 14.386 bytes; and the maximum
compressed array size was only 19,865 bytes, well below the
20,000 bytes available.

We have also tested the case where z/n = m/n = 4 against
using m/n = 7, or seven array bits per element. We expect this
case may prove less useful in practical situations because
the false positive rate is so high. In this case using the
standard Bloom filter with the optimal three hash functions
yields a false positive rate of 0.147; using m/n = 7 and one
hash function gives a false positive rate of 0.133. Again, we
performed 100,000 random experiments with n = 10,000.
The largest compressed filter required 4,998 bytes, just shy
of the 5,000 byte limit.


Footnote 2: We note that this is an adaptive compressor, which bases
its prediction of the next bit on the bits seen thus far.
Technically it is slightly suboptimal for our purposes, since 
we generally know the probability distribution of the bits 
ahead of time. In practice the difference is quite small. 

Figure 1: The false positive rate as a function of the number of hash functions for compressed and standard
Bloom filters using 8 bits per element.

Figure 2: The false positive rate as a function of the number of hash functions for compressed and standard
Bloom filters using 16 bits per element.


Array bits per element          m/n   8        14       92
Transmission bits per element   z/n   8        7.923    7.923
Hash functions                  k     6        2        1
False positive rate             f     0.0216   0.0177   0.0108

Table 1: At most eight bits per item (compressed).


Array bits per element          m/n   16         28         48
Transmission bits per element   z/n   16         15.846     15.829
Hash functions                  k     11         4          3
False positive rate             f     0.000459   0.000314   0.000222

Table 2: At most sixteen bits per item (compressed).


Array bits per element          m/n   4       7
Transmission bits per element   z/n   4       3.962
Hash functions                  k     3       1
False positive rate             f     0.147   0.133

Table 3: At most four bits per item (compressed).


Array bits per element          m/n   8        12.6     46
Transmission bits per element   z/n   8        7.582    6.891
Hash functions                  k     6        2        1
False positive rate             f     0.0216   0.0216   0.0215

Table 4: Maintaining a false positive rate around 0.02.


Array bits per element          m/n   16         37.5       93
Transmission bits per element   z/n   16         14.666     13.815
Hash functions                  k     11         3          2
False positive rate             f     0.000459   0.000454   0.000453

Table 5: Maintaining a false positive rate around 0.00045.


As previously mentioned, we may also consider the optimization
problem in another light: we may try to maintain
the same false positive rate while minimizing the transmission
size. In Tables 4 and 5 we offer examples based on this
scenario. Our results yield transmission size decreases in the
range of roughly 5-15% for systems of reasonable size. Here
again our simulations bear out our theoretical analysis. For
example, using n = 10,000 elements, m = 126,000 bits, and
k = 2 hash functions over 100,000 trials, we find the average
compressed filter required 9,493 bytes, closely matching the
theoretical prediction. The largest filter over the 100,000
trials required 9,539 bytes.
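
The array and transmission sizes for a fixed false positive rate can be derived by inverting $f = (1 - p)^k$ and $p = e^{-kn/m}$. The sketch below does this under the idealized model z = mH(p); the function name bits_for_target is an illustrative choice, and the results differ slightly from the tabulated values because the tables use rounded array sizes.

```python
import math


def bits_for_target(f_target: float, k: int):
    """Return (m/n, z/n) needed to reach f_target with k hash functions."""
    p = 1 - f_target ** (1 / k)          # from f = (1 - p)^k
    m_per_item = -k / math.log(p)        # from p = e^{-kn/m}
    h = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
    return m_per_item, m_per_item * h    # idealized z = m * H(p)


# Target the false positive rate of the standard filter with m/n = 8, k = 6.
for k in (1, 2):
    m_n, z_n = bits_for_target(0.0216, k)
    print(f"k = {k}  m/n = {m_n:.1f}  z/n = {z_n:.3f}")
```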

4. DELTA COMPRESSION 

In the Web cache sharing setting, the proxies periodically 
broadcast updates to their cache contents. As described in 
[7], these updates can either be new Bloom filters or rep- 
resentations of the changes between the updated filter and 
the old filter. The difference, or delta, between the updated 
and old filter can be represented by the exclusive-or of the 
corresponding bit arrays of size m, which can then be com- 
pressed using arithmetic coding as above. For example, one 
may decide that updates should be broadcast whenever 5% 
of the underlying array bits have changed; in this case, the 
compressed size of the delta would be roughly mH(0.05).
Hence one may wish to optimize the array size for a target 
size of the compressed delta and allow the one-time cost of 
longer initial messages to establish a base Bloom filter at 
the beginning. It makes sense to cast this problem as an 
optimization problem in a manner similar to what we have 
done previously. As we will show, using compressed Bloom 
filters in conjunction with delta compression can yield even 
greater performance gains. 
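To make the delta representation concrete, the following
Python sketch forms the exclusive-or of two bit arrays and
estimates the compressed size of the result as m times the
entropy of its fraction of 1 bits. It is only an
illustration: the byte-string representation and the
function names are our assumptions, and the size estimate
stands in for an actual arithmetic coder.

```python
import math


def binary_entropy(q: float) -> float:
    """Entropy H(q), in bits, of a bit that is 1 with probability q."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -q * math.log2(q) - (1.0 - q) * math.log2(1.0 - q)


def xor_delta(old_bits: bytes, new_bits: bytes) -> bytes:
    """Delta between two filters of the same size m, as a bitwise XOR."""
    if len(old_bits) != len(new_bits):
        raise ValueError("both filters must use the same array size m")
    return bytes(a ^ b for a, b in zip(old_bits, new_bits))


def apply_delta(old_bits: bytes, delta: bytes) -> bytes:
    """Recover the updated filter; XOR is its own inverse."""
    return xor_delta(old_bits, delta)


def estimated_delta_size_bits(delta: bytes) -> float:
    """Estimate m * H(fraction of changed bits) for a delta."""
    m = 8 * len(delta)
    changed = sum(bin(byte).count("1") for byte in delta)
    return m * binary_entropy(changed / m)


if __name__ == "__main__":
    old = bytes([0b10101010, 0b11111111])
    new = bytes([0b10101011, 0b00001111])
    delta = xor_delta(old, new)
    assert apply_delta(old, delta) == new
    print(estimated_delta_size_bits(delta))
```

A receiver holding the old filter needs only the delta to
reconstruct the new one, which is why a missed delta leaves
the two copies out of step until a full filter is sent.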

We emphasize that using delta compression may not be suit- 
able for all applications. For example, sending deltas may 
not be suitable for systems with poor reliability; a missed 
delta may mean a proxy filter remains improperly synchro- 
nized for a long period of time (assuming full filters are 
sent occasionally to resynchronize). In many cases, how- 
ever, sending deltas will be preferred. 


Suppose that our set S of elements changes over time through
insertions and deletions, but the size is fixed at n elements.
We send a delta whenever a fraction c of the n entries of the
set have changed. We consider the case where our goal is to
minimize the false positive rate f while maintaining a
specific size for the delta. We again have the power to choose
the array size m and the number of hash functions k, given
n and the compressed delta size, which we denote here by
z. In this setting we let q be the probability that a bit in
the delta is a 1 given that a fraction c of the n entries have
changed. Similar to the case for compressed Bloom filters,
our constraint is z = mH(q). As before we let $p = e^{-kn/m}$
be the probability a bit in the Bloom filter is 0.

We determine an expression for q in terms of other
parameters. A bit will be 1 in the delta in one of two cases. In
the first case, the corresponding bit in the Bloom filter was
originally a 0 but became a 1 after the entries changed. The
probability that the bit was originally 0 is just p; the
probability that the cn new entries fail to change that bit to
a 1 is $(1 - 1/m)^{cnk} \approx e^{-cnk/m}$; so (using the
asymptotic approximation) the overall probability of this
first case is $p(1 - e^{-cnk/m})$.

In the second case, the corresponding bit in the Bloom filter
was originally a 1 but became a 0 after the entries changed.
This is equivalent to the previous case. The probability that
the bit is 0 at the end is just p; the probability that the cn
deleted entries failed to set that bit to 1 was $(1 - 1/m)^{cnk}$,
and the overall probability of this case is also
$p(1 - e^{-cnk/m})$. Hence $q = 2p(1 - e^{-cnk/m}) = 2p(1 - p^c)$.
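As a numerical illustration of this derivation, the sketch
below evaluates q = 2p(1 - p^c), the delta size per element
z/n = (m/n)H(q), and the false positive rate f = (1 - p)^k
for given m/n, k, and c. It assumes the asymptotic
approximation p = e^{-kn/m} used above; the function names
are hypothetical.

```python
import math


def binary_entropy(q: float) -> float:
    """Entropy H(q), in bits, of a bit that is 1 with probability q."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -q * math.log2(q) - (1.0 - q) * math.log2(1.0 - q)


def delta_stats(bits_per_element: float, k: int, c: float) -> tuple[float, float, float]:
    """Return (q, z/n, f) for a delta sent after a fraction c of entries change."""
    p = math.exp(-k / bits_per_element)      # probability a filter bit is 0
    q = 2.0 * p * (1.0 - p ** c)             # probability a delta bit is 1
    z_per_element = bits_per_element * binary_entropy(q)
    false_positive_rate = (1.0 - p) ** k
    return q, z_per_element, false_positive_rate


if __name__ == "__main__":
    # Standard filter (m/n = 8, k = 5) and a compressed one (m/n = 13, k = 2).
    for m_over_n, k in [(8, 5), (13, 2)]:
        q, z, f = delta_stats(m_over_n, k, c=0.05)
        print(f"m/n={m_over_n:<3} k={k}  q={q:.5f}  z/n={z:.4f}  f={f:.4f}")
```

With c = 0.05, the two configurations in the loop have
nearly the same false positive rate, while the second
sends noticeably fewer bits per element in each delta.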

The false positive rate satisfies

$$f = (1-p)^k = (1-p)^{(-\ln p)(m/n)} = (1-p)^{-\frac{z \ln p}{n H(q)}}.$$

Since z and n are given, minimizing f is equivalent to
minimizing

$$\beta = e^{-\frac{\ln(p)\,\ln(1-p)}{H(2p(1-p^c))}}.$$

Unfortunately, we have lost the appealing symmetry of the
standard and compressed Bloom filter, making analysis of
the above expression unwieldy. The value of $\beta$ still appears
to be minimized as $p \to 1$ for any c < 1/2, but a simple
formal proof appears challenging.

Array bits per element          m/n   8        12       32        13
Transmission bits per element   z/n   1.6713   1.6607   1.6532    1.3124
Hash functions                  k     5        3        2         2
False positive rate             f     0.0217   0.0108   0.00367   0.0203

Table 6: Comparing the standard Bloom filter and compressed Bloom filters with delta encoding, c = 0.05.

Array bits per element          m/n   8        16       48        13
Transmission bits per element   z/n   0.4624   0.4856   0.4500    0.3430
Hash functions                  k     5        3        2         2
False positive rate             f     0.0217   0.0050   0.00167   0.0203

Table 7: Comparing the standard Bloom filter and compressed Bloom filters with delta encoding, c = 0.01.

It is worthwhile to again consider how f behaves in the
limiting case as $p \to 1$. Algebraic manipulation yields that
in this case $\beta \to (1/2)^{1/(2c)}$, so f approaches
$(0.5)^{z/(2cn)}$. This result is intuitive under the reasoning
that the limiting case corresponds to hashing each element
into a large number of bits. The exponent is z/(2cn) instead
of z/n since the updates represent both deletions and
insertions of cn items; half of the bits sent describe the
array elements to be deleted.
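The limiting behavior can be examined numerically. The
sketch below evaluates the exponent expression
e^{-ln(p) ln(1-p) / H(2p(1-p^c))} as 1 - p shrinks and
compares it with (1/2)^{1/(2c)}. Parameterizing by
eps = 1 - p and using log1p and expm1 is our choice, made
only to keep the floating-point evaluation stable for very
small eps; the names are hypothetical.

```python
import math


def binary_entropy(q: float) -> float:
    """Entropy H(q) in bits, written with log1p so tiny q stays accurate."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -(q * math.log(q) + (1.0 - q) * math.log1p(-q)) / math.log(2.0)


def beta_from_eps(eps: float, c: float) -> float:
    """Evaluate exp(-ln(p) ln(1-p) / H(2p(1 - p^c))) with p = 1 - eps."""
    ln_p = math.log1p(-eps)                  # ln(p)
    one_minus_p_to_c = -math.expm1(c * ln_p)  # 1 - p^c
    q = 2.0 * (1.0 - eps) * one_minus_p_to_c
    return math.exp(-ln_p * math.log(eps) / binary_entropy(q))


if __name__ == "__main__":
    c = 0.05
    limit = 0.5 ** (1.0 / (2.0 * c))
    for eps in (0.5, 0.1, 0.01, 1e-6, 1e-50, 1e-200):
        print(f"1-p={eps:<8g} beta={beta_from_eps(eps, c):.6f}  limit={limit:.6f}")
```

In this evaluation the value decreases toward the limit as
p approaches 1, but only slowly, which is consistent with
the practical configurations in the tables using moderate
array sizes.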

We present some examples for results in this setting in
Tables 6 and 7. It is important to note that in these tables, the
false positive rate f is given for the Bloom filter in isolation;
additional false positives and negatives may arise because
the filter has changed since the last delta has been sent.
Also, as before these tables are based on the analysis above
and do not take into account compression overhead and
variability, which tend to have a greater effect when the number
of transmitted bits is smaller.

In Table 6, we consider the case where 5% of the elements of
the set S change between updates. A standard Bloom filter using
8 bits per element and 5 hash functions uses only about
1.67 bits per item when using delta compression. Alternative
configurations using more array bits per item and fewer
hash functions can achieve the same transmission size while
dramatically reducing the false positive rate f. Using four
times as much memory (32 bits per element) for the
decompressed filter lowers f by a factor of six. The scenario with
m/n = 32 and k = 2 hash functions was tested with
simulations. Over 100,000 trials, the average compressed filter
required 2,090 bytes, closely matching the theoretical
prediction of 2,066.5 bytes. The maximum size required was
2,129 bytes. Alternatively, one can aim for the same false
positive ratio while improving compression. As shown in
the last column of Table 6, one can achieve the same false
positive ratio as the standard Bloom filter while using only
about 1.31 bits per item, a reduction of over 20%.

With more frequent updates, so that only 1% of the ele- 
ments change between updates, the transmission require- 
ments drop below 1/2 of a bit per item for a standard Bloom 
filter. As shown in Table 7, substantial reductions in the 
false positive rate or the bits per item can again be achieved. 


5. COUNTING BLOOM FILTERS 

In [7], the authors also describe an extension to a Bloom
filter, where instead of using a bit array the Bloom filter
array uses a small number of bits per entry to keep counts.
The jth entry is incremented for each hash function $h_i$ and
each element x represented by the filter such that $h_i(x) = j$.
The counting Bloom filter is useful when items can be deleted
from the filter; when an item x is deleted, one can decrement
the value at location $h_i(x)$ in the array for each of the k
hash functions, i.e. for $1 \le i \le k$. We emphasize that these
counting Bloom filters are not passed as messages in [7];
they are only used locally.
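The increment and decrement rules just described can be
sketched in a few lines of Python. This is an illustration
only, not the implementation of [7]: the class name, the
use of salted SHA-256 digests as stand-ins for the k hash
functions, and the unbounded integer counters (rather than
a small fixed number of bits per entry) are all our
assumptions.

```python
import hashlib
from collections import Counter


class CountingBloomFilter:
    """Toy counting Bloom filter with m counters and k hash functions."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        self.counts = [0] * m

    def _positions(self, item: str):
        # Assumption: h_i(x) is simulated by hashing the pair (i, x).
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        for j in self._positions(item):
            self.counts[j] += 1

    def remove(self, item: str) -> None:
        """Decrement the counters of an item that was previously added."""
        needed = Counter(self._positions(item))
        if any(self.counts[j] < times for j, times in needed.items()):
            raise KeyError(item)  # item cannot be in the filter
        for j, times in needed.items():
            self.counts[j] -= times

    def __contains__(self, item: str) -> bool:
        return all(self.counts[j] > 0 for j in self._positions(item))

    def to_bit_array(self) -> list[int]:
        """Ordinary Bloom filter bits: 1 wherever the count is nonzero."""
        return [1 if count > 0 else 0 for count in self.counts]
```

Only the bit array returned by to_bit_array would be sent
to other proxies in the setting above; the counters stay
local so that deletions can be applied.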

We note in passing that if one wanted to pass counting 
Bloom filters as messages, compression would yield sub- 
stantial gains. The entropy per array entry would be much 
smaller than the number of bits used per entry, since large 
counts would be extremely unlikely. Our optimization ap- 
proach for finding appropriate parameters can be extended 
to this situation, and arithmetic coding remains highly ef- 
fective. We expect that similar variations of Bloom filters 
would benefit from compression as well. 
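To give a rough sense of the gap between entropy and
storage per entry, the sketch below computes the entropy of
a single counter under the assumption that its value is
approximately Poisson distributed with mean nk/m. Both the
Poisson approximation and the figure of 4 bits per counter
used for comparison are our illustrative assumptions.

```python
import math


def poisson_entropy_bits(mean: float, max_count: int = 64) -> float:
    """Entropy in bits of a Poisson(mean) count, truncated at max_count."""
    entropy = 0.0
    prob = math.exp(-mean)  # P(count = 0)
    for j in range(max_count + 1):
        if prob > 0.0:
            entropy -= prob * math.log2(prob)
        prob *= mean / (j + 1)  # P(count = j + 1) from P(count = j)
    return entropy


if __name__ == "__main__":
    m_over_n, k = 8, 5
    bits_stored = 4  # hypothetical counter width
    h = poisson_entropy_bits(k / m_over_n)
    print(f"entropy per counter: {h:.3f} bits, stored: {bits_stored} bits")
```

Under these assumptions the entropy per entry is well below
the storage width, since counts of three or more are rare.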

6. CONCLUSIONS 

We have shown that using compression can improve Bloom 
filter performance, in the sense that we can achieve a smaller 
false positive rate as a function of the compressed size over 
a Bloom filter that does not use compression. More gener- 
ally, this is an example of a situation where we are using 
a data structure as a message in a distributed protocol. In 
this setting, where the transmission size may be important, 
using compression affects how one should tune the param- 
eters of the data structure. It would be interesting to find 
other useful examples of data structures that can be tuned 
effectively in a different manner when being compressed. 

Our work suggests several interesting theoretical questions. 
For example, our analysis depends highly on the assumption 
that the hash functions used for the Bloom filter behave like 
completely random functions. It is an open question to de- 
termine what sort of performance guarantees are possible 
using practical hash function families. Also, it is not clear 
that the Bloom filter is necessarily the best data structure 
for this problem; perhaps another data structure would al- 
low even better results. 

Finally, we have not yet implemented compressed Bloom
filters in the context of a full working system for an application
such as distributed Web caching. We expect that significant
performance improvement will occur even after minor costs
such as compression and decompression time are factored
in. The interaction of the compressed Bloom filter with a
full system may lead to further interesting questions.

Appendix: Mathematical details

In this section we briefly discuss some of the mathematical 
issues we have glossed over previously. Specifically, we wish 
to show that the size of a compressed Bloom filter is very 
close to mH(p) with high probability. We sketch the ar- 
gument, omitting the fine detail and focusing on the main 
points. 

We have calculated that the expected fraction of 0 bits in a
Bloom filter with m bits, k hash functions, and n elements
is $p' = (1 - 1/m)^{nk}$. We proceeded as though the bits in the
Bloom filter were independent with probability $p = e^{-nk/m}$.
The difference between p' and p is well known to be very
small, as $(1 - 1/m)^m$ converges quickly to 1/e. We ignore
this distinction subsequently. The bits of the Bloom filter,
however, are also not independent. In fact, as we describe
subsequently, for arithmetic coding to perform well it
suffices that the fraction of 0 bits is highly concentrated around
its mean. This concentration follows from a standard
martingale argument.

Theorem 1. Suppose a Bloom filter is built with k hash
functions, n elements, and m bits, using the model of
perfectly random hash functions. Let X be the number of 0 bits.
Then

$$\Pr[\,|X - mp| > \epsilon m\,] < 2e^{-\epsilon^2 m^2 / (2nk)}.$$

Proof. This is a standard application of Azuma's
inequality. (See, e.g., [2, Theorem 2.1].) Pick an order for
the elements to be hashed. Let $X_j$ be the expected number
of 0 bits in the final filter given the outcomes of the
first j hashes. Then $X_0, X_1, \ldots, X_{nk} = X$
is a martingale, with $|X_i - X_{i+1}| \le 1$. The theorem then
follows. □
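The concentration claimed by Theorem 1 is easy to observe
empirically. The sketch below simulates the model of
perfectly random hash functions by drawing nk independent
uniform positions with a seeded pseudorandom generator
(our stand-in for the hashes), counts the 0 bits, and
compares the largest observed deviation from mp with the
bound of the theorem. The function names and the parameter
values are illustrative assumptions.

```python
import math
import random


def count_zero_bits(m: int, n: int, k: int, rng: random.Random) -> int:
    """Build one filter from n*k uniform random hashes; return its 0-bit count."""
    bits = bytearray(m)
    for _ in range(n * k):
        bits[rng.randrange(m)] = 1
    return m - sum(bits)


def azuma_bound(m: int, n: int, k: int, eps: float) -> float:
    """Right-hand side of Theorem 1: 2 exp(-eps^2 m^2 / (2 n k))."""
    return 2.0 * math.exp(-((eps * m) ** 2) / (2.0 * n * k))


def max_deviation(m: int, n: int, k: int, trials: int, seed: int = 0) -> float:
    """Largest |X - m p| over several simulated filters, with p = exp(-nk/m)."""
    rng = random.Random(seed)
    expected_zeros = m * math.exp(-n * k / m)
    return max(
        abs(count_zero_bits(m, n, k, rng) - expected_zeros) for _ in range(trials)
    )


if __name__ == "__main__":
    m, n, k, eps = 10_000, 1_000, 5, 0.05
    print("largest deviation:", max_deviation(m, n, k, trials=20))
    print("threshold eps*m:  ", eps * m)
    print("Azuma bound:      ", azuma_bound(m, n, k, eps))
```

The bound is far from tight for these parameters; it is
used here only because it suffices for the argument.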

For our arithmetic coding, we suppose that we use an
adaptive arithmetic coder that works as follows. There are two
counters, $C_0$ and $C_1$; $C_i$ is incremented every time the bit
value i is seen. Initially the $C_i$ are set to 1, to avoid
division-by-zero problems below. The encoder and decoder use the
model that the probability the next bit is i is
$C_i / (C_i + C_{1-i})$ to determine how to perform the
arithmetic coding. (So initially, when no information is
given, the encoder and decoder assume the first bit is
equally likely to be a 0 or a 1.)


Recall that for arithmetic coding the total length L of the 
encoding is the logarithm of the inverse of the product of 
the model probabilities for each bit. In this case, if there 
are m bits total and x of the bits are 0, regardless of the 
position of the x 0 bits the total length L of the encoding 
satisfies 


$$L = \log_2 \frac{(m+1)!}{x!\,(m-x)!}.$$


(See, for example, [8].) We consider the case where x = pm 
for some constant p. Simplifying, we have 


$$L = \log_2 \frac{(m+1)!}{(pm)!\,((1-p)m)!} = \log_2 \binom{m}{pm} + O(\log m) = mH(p) + O(\log m).$$


In the above we have used the approximation 


$$\binom{m}{pm} = 2^{mH(p) + O(\log m)},$$


which follows by Stirling’s formula for a constant p. 
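The length formula can be checked directly. The sketch
below accumulates the ideal code length of the adaptive
model described above (the sum of the negative base-2
logarithms of the model probabilities) and compares it with
the closed form log2((m+1)!/(x!(m-x)!)) and with mH(p). It
computes ideal lengths only and does not implement an
actual arithmetic coder; the function names are
hypothetical.

```python
import math


def adaptive_code_length_bits(bits: list[int]) -> float:
    """Ideal code length under the two-counter adaptive model."""
    counts = [1, 1]  # C_0 and C_1 both start at 1
    total = 0.0
    for b in bits:
        prob = counts[b] / (counts[0] + counts[1])
        total -= math.log2(prob)
        counts[b] += 1
    return total


def closed_form_length_bits(m: int, x: int) -> float:
    """log2((m+1)! / (x! (m-x)!)), evaluated with log-gamma."""
    log_value = math.lgamma(m + 2) - math.lgamma(x + 1) - math.lgamma(m - x + 1)
    return log_value / math.log(2.0)


def binary_entropy(q: float) -> float:
    """Entropy H(q), in bits, of a bit that is 1 with probability q."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -q * math.log2(q) - (1.0 - q) * math.log2(1.0 - q)


if __name__ == "__main__":
    m, x = 10_000, 6_065  # x zero bits out of m
    bits = [0] * x + [1] * (m - x)
    print(adaptive_code_length_bits(bits))
    print(closed_form_length_bits(m, x))
    print(m * binary_entropy(x / m))
```

Because the product of the model probabilities depends
only on the number of 0 bits and not on their positions,
any ordering of the same bits gives the same length.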


Since the observed fraction x/m of 0 bits (denoted p in the
calculation above) is with high probability close to p',
which is very close to $e^{-nk/m}$, the total number of bits
used by the encoding is close to mH(p) with high probability.



